Recent Changes - Search:

Home & News

Organization

Assignments

Support

edit SideBar

Lab2

General Remarks

  • Group work is NOT allowed in the lab. You have to work alone. Discussions with colleagues (e.g., in the forum) are allowed but the code has to be written alone.
  • Be sure to check the Tricky Parts section for questions!
  • Further reading is provided under tutorials
  • In order to guarantee that all students can work fluently, do not waste resources on the lab server (i.e., do not let your RMI servers run unnecessarily, kill your running zombie processes,...)

Submission Guide

Submission

  • You must upload your solution using the Teaching Tool before the submission deadline: 07.12.2007, 18:00 cet. - the deadline is hard!
  • You are responsible for submitting your solution in time! Please make sure that your upload was successful (i.e., you should be able to download your solution in the teaching tool - as the tutors will do during the interview). If you do not submit, you won't get any points!
  • You have to upload your solution as a ZIP file. Please submit only the sources of your solution (not the compiled class files and no third-party libraries).
  • Before the submission deadline, you can upload your solution as often as you like. Remember that only your last submission is taken into consideration!
  • Your submission must compile and run in our lab environment. Please test before you submit.
  • Please provide ant-tasks to ease testing. You may use our ant template.

Interviews

  • After the submission deadline, there will be a mandatory interview (Abgabegespräch). You must register for a time slot to the interviews using the Teaching Tool.
  • You can do the interview only if you have submitted your solution before the deadline!
  • The interviews start two weeks before the main "interview" - week. If you pass before 05.12.2007 you will earn 2 bonus points!
  • The interview will take place in the DSLab. During the interview, you will be asked about the solution that you have uploaded (i.e., changes after the deadline will not be taken into account!). In the interview you need to explain your code, design and architecture in detail.
  • Remember that you can do the interview only once!

Description

In this assignment you will learn:

  • the basics of a simple distributed object technology (RMI)
  • how to lookup objects with a naming service
  • how to implement a callback with RMI
  • how to implement a replication mechanism based on RMI

This year you shall use this features to implement a simple(!) distributed database system.

Overview
Replication is a formidable means to either increase the availability or the performance of a system. We use replication to store the data of the database in several files to increase its availability.

To access the replica's in a consistent and transparent manner you have to build a facade class that accesses the replica's via the RMI protocol. This facade class shall offer a method for each function of the database. This facade class is directly instantiated within your testing code and clients.

Database functionality

You shall realize simple database functionality that stores tables and records (inside files). The focus of this lab is NOT on the database functionality. Hence it suffices if the functionality is implemented in a simple way.

The required functionality is listed below. In addition to these descriptions we also provide similar SQL statements (the function description resemble SQL commands - however, you need NOT support SQL syntax, you just have to provide appropriate interfaces that can be called by developers).

  • Create Table creates a new table. A table in our database stores various records. Multiple tables with different names shall be supported. A table has multiple columns that store character strings. Each column has a name and a length, ie the number of characters that can be stored. You do not need to take care if columns are used as key columns. You can assume that the columnname is as small as the number of characters that can be stored. The order of the columns is not important. The columnname will only need to take alphanumeric characters. You need not consider storing newlines or tabulators in the columns (as well as the column name). You need not check if non-alphanumeric characters are provided as arguments. If you want to get all points for this lab assignment you should consider storing a version number (of the table data) as you need it in the replication part.
  • Insert Into Table inserts data into a table. The user has to provide sufficient string arguments for all columns (you can also provide an associative data structure such as Hashtables that take combinations of columnnames and new-values).
  • Select From Table selects data from a table. Your implementation of this function shall take one columnname-columnvalue pair as a parameter to identify records within the table: all records with an entry that equals the given value in the identified column shall be returned. If a special column name is provided (eg.:*) then all records shall be returned.
  • Update Table updates a particular record. As the select function takes one columnname-columnvalue pair as a parameter to identify the records that shall be updated. The remaining arguments required by your update function identify only new values of the identified records. The other columns' values shall not be modified (of course these values can be rewritten with the previous values).
  • Delete From Table deletes all records that are identified by one columnname-columnvalue pair.

There are different possibilities how you store and access the database data:

  • you can use Java serialization to store the table's data in a single file. The advantage is that this is very easy to implement. The disadvantage is that serialized data does not typically produce human readable files. Hence, you shall consider to overwrite the toString method to debug your implementation.
  • you can use Java's RandomAccessFile (java.io.RandomAccessFile) class to do the access. This produces files that have a fixed-length record structure. This requires more implementation effort than the serialization variant. However, the advantage is that the files may be human-readable. The RandomAccessFile class provides a method seek that allows to move a file pointer to a certain position. This position can be calculated by considering the column lengths and the record number. Typically the first line of the record shall contain information about the structure (ie, columnnames and column lenghts).
  • principally other persistence mechanisms are possible, too (eg. reading a file stream and writing the whole stream, ...). However, you should stick to the two possibilities mentioned above.
  • support to store the data of a particular database instance in its own directory. You can parameterize this with a constructor parameter. You will need this in the second part of the assignment.

You have to check only the following cases for wrong user input and throw an exception.

  • invalid table name: only alphanumeric characters are allowed for a table name, or an unknown table is accessed. Why do we restrict this to alphanumeric characters? Regardless how you store the table's data it will be stored in the file system. The easiest way to do this is to use files that have the same name as the table. Table names that include forward slashes or other control characters may be problematic when creating or opening a file.
  • invalid column name: for select, update, delete a criteria has to be provided. If an invalid column for this criteria is provided then throw an exception.
  • invalid data: any case that new data is invalid (column data is longer than the maximum length, invalid column names, data is not provided for all columns in the insert, ...)

You shall implement your own Exception class. However, we recommend that you use an unchecked exception, ie. the exception class extends the RuntimeException class. These exceptions are not checked by the Java compiler and you do not need an appropriate throws clause. The advantage of these unchecked exceptions is that the implementation code can throw any exception without requiring the client code know every potential exception.

We highly recommend that you build a test client to verify the functionality of your database.

Build a Remote Database

The general idea is to make your database remotely accessible via RMI (Remote Method Invocation).

  • Design an remote interface, ie include appropriate methods to make the database functionality accessible by remote clients.
  • Implement also an RMI (Remote Method Invocation) server to implement the remote interface. You can either implement the database functionality directly within the RMI server, or you can implement an additional server class that accesses the database functionality inside another class.
  • Your server shall be started with a particular parameter that identifies the server. Register this instance of your RMI server within an RMI registry with the identification parameter.

You find all required information about RMI in the Java Remote Method Invocation tutorial. The link is given below. If you follow this tutorial you should be able to make your database remoteable.

Some additional hints:

  • When you test your implementation at the server you have to bind your server to the global rmi registry with a uniquely defined name. To not interfer with other students you should bind to the registry with something like 'DBServerXXX-Y' whereas XXX is your lab id, y is a running number you can assign yourselves.
  • Start one or multiple implementations and register them at the registry.
  • Consider also to unbind from the registry when you stopping your process.

It is mandatory that you build a test client that verifies all the functions that are available through your remote interface. This means: build a JUnit test client that has a test method for each of the available remote interface methods.

Callback

Define an additional interface that acts as a notification callback, ie. its single method gets called whenever a database change happens (you don't need to consider creation of a table). In Java classes that implement such notification callbacks are called listeners - you can find more information about RMI Callbacks in the link below. The single method shall be called with the table name, the cause of the database change (delete,insert,update), and the modified data.

Enhance the remote interface for the database access to add and remove such listeners (more than one listener may be registered at the same time) for a particular table. Whenever a change happens to the database call the notification method of ALL registered listeners.

To let the listener work in a remote setting you have to:

  • build an implementation class that implements your callback interface. Think about the necessary action to let the JUnit test client verify that the appropriate notification has been called. As a debugging aid you may output the notification data to the console.
  • export an instance of this implementation class. You don't need to register this instance at the RMI registry (and you should not do this). Unexport the instance of the listener's implementation class when it is no longer needed.
  • add the listener instance to your remote database access via your enhanced remote interface.

Enhance the test client that shows that a modification of a database accessible via RMI leads to an invocation of an RMI-exported listener. Doing these tests locally without RMI is nice but will not give you the points.

Replication

In this second part you have to build a replicated database server. Gifford's scheme shall be used to keep the access to the replicas consistent. To support this version numbers shall be assigned to data files (exactly one version number for one table).

Gifford's scheme works as follows: To read a file that has N replicas a so-called read quorum is assembled, i.e. a subset of NR servers among these N replicas. These NR may be chosen arbitrarily. To modify a file a write quorum of NW servers is built in the same way as the read quorum.

The two values NR and NW need to satisfy two constraints:

  1. NR + NW > N
  2. NW > N/2

The following figure gives an example of Gifford's scheme. The first three servers have been selected as the write quorum, the last two servers have been selected as a read quorum.

Only if the above constraints are satisfied then at least one server of the read quorum will lead to recent data. Similarly, if the constraints are satisfied then always one write server will have recent data.

You shall build a client-side facade class that realizes this Quorum Based replication scheme. This class is initialized with a set of N replicas (these might be already looked-up RMI proxies, but these might just be the names of the RMI services, the lookup for the RMI proxies has then to be done later), and the number NR and NW. Verify that the constraints described above are satisfied. In case of a contradition react accordingly, ie throw an exception.

For each database access either a read quorum or a write quorum (or both) are created: select randomly (i.e. with a random number generator) the appropriate subset of servers accordingly. This subset shall be selected at each read or write request.

For reading a replicated value all servers of the read quorum need to be contacted. The data of the server with the highest version number is chosen. When data gets updated/written all servers in the write quorum are asked for the version number of the data. The data of all the servers in the write quorum are updated. The new version number equals the version number of the server in the write quorum with the highest version number plus 1. Important Remark: there is only one version number for each table. If you update data of a server with a smaller version from a server with a higher version then you have to replicate ALL records from the server with the higher version. This constraint needs always be valid: identical version numbers implies identical data.

Test Client

For testing (and demonstrating) that your distributed database works you shall start multiple RMI servers that provide access to an instance of your database. These database instances shall write to different file system locations. To test your implementation you shall write one (or multiple if you like) JUnit test classes that include:

  • a test method that creates a new table in each of your distributed databases. Fill it with some test data (the same data in each replicated database). You can hardcode the names of your databases.
  • a test method for each of the select,insert,update,delete functionalities that connects to one (or all) of your remote database instances and use the corresponding functionality. Use the select functionality to verify if data has really changed.
  • write a test method to demonstrate that changing the database invokes the callback. You can easily show this by letting the listener class store the change cause and the changed data.

To test the implementation of Gifford's scheme write following separate test class:

  • connect to a fixed number of servers and create one instance of the facade class that realizes the quorum based access with a correct initialization of NR and NW (select appropriate numbers for NR and NW).
  • write test cases for the replicated insert,update and delete methods. Show that the newly updated data is available in just NW of all server instances by connecting to these server instances directly (instead of the facade class). This can easily be done if you have remote methods to retrieve the version number.

Hints Tricky Parts

Following hints will be useful when dealing with RMI:

  • If you work at home you have to start your own rmiregistry (the command is called 'rmiregistry').
  • Please conform to the suggested naming scheme above: DBServerXXX-Y, whereas XXX is your lab id, and Y is a number (or character) you can assign to your server.
  • You have to start your server with setting the java.rmi.server.codebase property to a classpath where your compiled remote interfaces reside. Java Properties can be set on java startup with java -Dpropertyname=value.
  • Typically the codebase has to be set with an absolute path starting with the url prefix file:///classpath, eg. java.rmi.server.codebase=file:///home/users/user123/lab2/build/
  • If the codebase refers to a directory there must be a final "/" character.
  • If you are using Windows providing the codebase is a little bit tricky because of the backslashes and spaces inside names (the same is valid if you use spaces in Unix file names): first you have to convert all backslashes to forward slashes. Second you have to replace all space characters with %20. For instance: file:///C:/Documents%20and%20Settings/...
  • RMI servers will normally block until the process is killed. If you use ant to start your server(s) then you must include a fork attribute in the java command. Otherwise the server process will be killed when ant finishes processing the build.xml file.
  • For using the Java 5 automatic proxy generation feature you must not use the static exportObject method (inside class UnicastRemoteObject) that returns java.rmi.server.RemoteStub. Instead you must use the exportObject that returns java.rmi.Remote. If you use the wrong method then the automatic proxy generation facility will NOT work. If you provide 0 for the port then a free port is assigned to your server.
  • you may also extend your server class directly from UnicastRemoteObject. Then you don't need to export your object directly. However the general disadvantage of this approach is that you waste the luxury of single inheritance in Java to implementing RMI (via UnicastRemoteObject) and can't have a super class that is specific to the domain of the application (eg. Distributed Databases in our example). For the lab you can choose any of the two options.
  • there is only one version number for each table. If you update data of a server with a smaller version from a server with a higher version then you have to replicate ALL records (not just the one being updated,inserted,deleted) from the server with the higher version.

This constraint needs always be valid: identical version numbers implies identical data.

  • if you do not use Java serialization deletion of records can be easily implemented when you mark the record with a special symbol (that means this record is deleted) because you do not have to rearrange the whole file.

Ant Template

For Lab 2 you can extend this Attach: build.xml that supports also Junit. The task definition entry in this file is appropriate for the combination of ant-1.6.x and JUnit as installed on the lab servers pasta and pizza. In addition you will need to include static method suite() in your JUnit test classes:

import org.junit.*;

public class TestClassX {

  ... test methods

  public static junit.framework.Test suite() {return new junit.framework.JUnit4TestAdapter(TestClassX.class);}
}

(If you use ant-1.7 (at your home PCs) you do not need to include this method. However, if you want to start the tests in the lab you should include it).

Important remark: the build.xml file does not include targets to start instances of your servers.

Proposed Tutorials

Java Remote Invocation (RMI) Tutorial

RMI Callbacks

Proposed Literature

Gifford's scheme is described in Tanenbaum's Distributed Systems Books - Principles and Paradigms in the Section about Replicated-Write Protocols.

Edit - History - Print - Recent Changes - Search
Page last modified on November 13, 2007, at 01:40 PM CET